Lecture 7: August 21st, 2023#
Check-in Question: Happy Monday! How was your weekend? For those in SoCal, were you able to get through the storm okay? Reach out to Yasmeen if there are any issues as a result of the storm (e.g. power outage).
Updates and Reminders
EDA Outcome Quizzes:
Try last week’s EDA outcome quizzes by midnight tonight (if you haven’t already). I’ll unlock more attempts for those who are missing them after this first round closes.
We’re ever-so-slightly behind where I thought we’d be, but I’d still like to make this the EDA checkpoint week. What that means for us:
There are four (4) EDA outcomes we haven’t seen yet. I will release quizzes for these outcomes by the end of the day, and give everyone four (4) attempts for each.
What does “checkpoint week” mean? It simply means I won’t be opening new quizzes for EDA outcomes after this week. If you are missing an EDA outcome that you want after the attempts this week, reach out to me so we can see what you’re missing and work to fill in any gaps. I would then recommend submitting an SLO revision form.
Today:
I want to show you some really cool interactive features of Altair. We’ll spend some time working through examples, as well as leafing through the documentation.
The interactive Altair features are how I’d like to round out our EDA Unit 3 material. We’ll try to start EDA Unit 4 today and finish it on Wednesday. I expect EDA Unit 4 to be a little bit shorter (and less exciting) than our previous units. The point will be to review some important Python concepts unrelated to data science before we formally begin our ML Units (exciting!).
Optimist me wants to say we’ll start ML on Wednesday, but most likely we’ll really begin on Friday.
We’ll start today’s lecture by going through the section on Multi-view plots in Altair. Other than creating the distinct rows, treat the creation of the chart as a warm-up.
Multi-view plots in Altair#
import altair as alt
Extremely important for interactive portions: let’s make sure we’re on Altair version 5.0.0 (at least).
alt.__version__
'5.0.1'
import seaborn as sns
df = sns.load_dataset("mpg")
Make a facet chart using “horsepower” for the x-coordinate, “mpg” for the y-coordinate, “cylinders” for the color with the Nominal data encoding type, and dividing the data according to the number of cylinders. Put each chart in its own row.
df.head()
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | usa | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | usa | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | usa | ford torino |
#skeleton of chart without divisions
alt.Chart(df).mark_point().encode(
#default encoding is Q
x="horsepower",
y="mpg",
#:N means nominal encoding
color="cylinders:N",
shape="cylinders:N"
)
Recall: Altair recognizes these two types of categorical data: Nominal (without a natural ordering), and Ordinal (with a natural ordering).
Note: Just because data has a natural ordering doesn’t mean we need to use it. Notice that the number of cylinders is ordered, but we can still tell Altair to treat it as Nominal.
#now with divisions along number of cylinders - vertical stacking
alt.Chart(df).mark_point().encode(
#default encoding is Q
x="horsepower",
y="mpg",
#:N means nominal encoding
color="cylinders:N",
shape="cylinders:N",
row="cylinders:N"
)
Question: What would be a benefit of stacking the data like this?
Brainstorming:
One benefit of stacking the data like this is that we can compare values from within a certain group (in this case, number of cylinders).
Notice that since we’ve stacked these charts vertically, it’s easy to compare the difference in horsepower between cylinders by drawing a vertical line from top to bottom.
What could I do if I wanted to stack the charts horizontally to compare mpg across cylinders?
#now with divisions along number of cylinders - horizontal stacking
alt.Chart(df).mark_point().encode(
#default encoding is Q
x="horsepower",
y="mpg",
#:N means nominal encoding
color="cylinders:N",
shape="cylinders:N",
column="cylinders:N"
)
Interactive charts in Altair - mpg dataset#
Interactive charts are one of the coolest parts of Altair! We’ve already seen a little bit of interactivity by including a tooltip. Let’s see a few more examples today. We can get some inspiration by checking out this link. Today, we will incorporate ideas from the following:
Interactive rectangular brush
Selection Histogram
Import Altair and check that it is at least version 5.
alt.__version__
'5.0.1'
Import the mpg dataset from Seaborn and save it with the name df.
df.head()
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | usa | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | usa | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | usa | ford torino |
Make a chart with “horsepower” along the x-axis, “weight” along the y-axis, and color/shape representing “origin”. Call this chart c1.
c1 = alt.Chart(df).mark_point().encode(
x="horsepower",
y="weight",
color="origin:N",
shape="origin:N"
)
c1
Following the Interactive Rectangular Brush example, add a selection interval to c1.
brush = alt.selection_interval()
c1 = alt.Chart(df).mark_point().encode(
x="horsepower",
y="weight",
color="origin:N",
shape="origin:N"
).add_params(brush)
c1
The chart looks exactly the same so far! How do I check? Drag a region over the chart :)
Yay! It’s working! But…doesn’t really do anything yet…
Spoiler: The purpose of this square will be to pass data to another chart.
Again using the documentation, make the selection so that it grays out everything that’s not in the box.
We’re going to use alt.condition() for the color. This tells us what to do if a condition is true, and what to do if a condition is false. This should remind you of np.where from NumPy.
brush = alt.selection_interval()
c1 = alt.Chart(df).mark_point().encode(
x="horsepower",
y="weight",
color=alt.condition(brush,"origin:N", alt.value('grey')),
shape="origin:N"
).add_params(brush)
c1
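The np.where comparison can be spelled out on a few made-up values. This is only a sketch of the analogy: the arrays and the box limits (90 to 140) are assumptions, and Altair evaluates its condition in the browser rather than with NumPy.

```python
import numpy as np

# Hypothetical values for four cars
horsepower = np.array([130.0, 165.0, 95.0, 88.0])
origin = np.array(["usa", "usa", "japan", "europe"])

# Stand-in for the brush: True where the point lies inside the selected range
in_box = (horsepower >= 90) & (horsepower <= 140)

# Where the condition holds keep the origin label, otherwise fall back to "grey"
colors = np.where(in_box, origin, "grey")

print(colors)
```

The three arguments line up with alt.condition(brush, "origin:N", alt.value('grey')): the test, the result when it is true, and the result when it is false.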
What if I wanted to specify the color scheme for “origin” as well?
brush = alt.selection_interval()
c1 = alt.Chart(df).mark_point().encode(
x="horsepower",
y="weight",
color=alt.condition(brush, alt.Color("origin:N").scale(scheme="turbo"), alt.value('grey')),
shape="origin:N"
).add_params(brush)
c1
alt.Color(...).scale(...) lets us set scale properties of the color encoding (in this case, the scheme).
Motivation: We’ll now construct a bar chart that depends on our selection from c1.
Make a bar chart with “origin” along the x-axis, and “count()” along the y-axis. Call this chart c2.
df.columns
Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
'acceleration', 'model_year', 'origin', 'name'],
dtype='object')
Recall: "count()" is an aggregation shorthand that Altair understands. In this case, it counts the rows corresponding to each place of origin.
c2 = alt.Chart(df).mark_bar().encode(
x="origin:N",
y="count()"
)
c2
How could I check these values using value_counts()?
df["origin"].value_counts()
usa 249
japan 79
europe 70
Name: origin, dtype: int64
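As a rough cross-check of what the bar heights represent, the same per-group counts can be produced in pandas in two ways. The six-row toy DataFrame below is an assumption for illustration; the real mpg data would be used in exactly the same way.

```python
import pandas as pd

# Hypothetical toy column standing in for df["origin"]
toy = pd.DataFrame({"origin": ["usa", "japan", "usa", "europe", "usa", "japan"]})

# value_counts sorts by frequency, largest first
by_frequency = toy["origin"].value_counts()

# groupby(...).size() gives the same counts, ordered by group label instead
by_label = toy.groupby("origin").size()

print(by_frequency)
print(by_label)
```

Both results hold one number per origin, which is the quantity each bar in c2 displays.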
Question from the chat: Can we swap xy-axes here?
c3 = alt.Chart(df).mark_bar().encode(
x="count()",
y="origin:N"
).transform_filter(brush)
Following the example from Selection Histogram, use .transform_filter(brush) to tell Altair to change c2 depending on our selection from c1.
c2 = alt.Chart(df).mark_bar().encode(
x="origin:N",
y="count()"
).transform_filter(brush)
#If we try to view c2 by itself now, there will be an error, because brush is only attached to c1; c2 needs to be displayed together with c1.
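Conceptually, the filter keeps only the rows inside the brushed rectangle and then counts them per origin. A rough pandas analogue is sketched below; the toy rows and the rectangle limits (hp_range, weight_range) are assumptions standing in for whatever region is dragged on c1.

```python
import pandas as pd

# Hypothetical toy rows standing in for the mpg dataset
toy = pd.DataFrame({
    "horsepower": [130.0, 165.0, 95.0, 88.0, 110.0],
    "weight": [3504, 3693, 2372, 2130, 2800],
    "origin": ["usa", "usa", "japan", "europe", "usa"],
})

# Hypothetical brush extent: (min, max) along each axis
hp_range = (90, 140)
weight_range = (2000, 3600)

# True for rows that fall inside the rectangle on both axes
inside = (
    toy["horsepower"].between(*hp_range)
    & toy["weight"].between(*weight_range)
)

# Filter first, then count per origin, as the linked bar chart does
selected_counts = toy[inside]["origin"].value_counts()

print(selected_counts)
```

Every time the rectangle moves, Altair repeats this filter-then-count step for the bar chart, which is why the bars update as the selection is dragged.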
Display c1 and c2 side-by-side by calling c1 | c2.
c1 | c2
c1 & c3
Our charts should be looking pretty good right now, but there are a few things I think we could improve:
Fix the y-axis on c2 so that it’s not changing with every new selection. Similarly for the x-axis.
From looking at value_counts() above for origin, I know that the largest count is 249 (usa), so 250 is a safe upper limit. This will influence how I set the domain for the y-axis.
c2 = alt.Chart(df).mark_bar().encode(
x=alt.X("origin:N").scale(domain=df["origin"].unique()),
y=alt.Y("count()").scale(domain=(0,250))
).transform_filter(brush)
c1 | c2
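Instead of reading the upper limit off the value_counts() output by eye, the two domains could also be computed from the data. This is a minimal sketch on a toy column (an assumption for illustration); the names x_domain and y_domain are hypothetical.

```python
import pandas as pd

# Hypothetical toy column standing in for df["origin"]
toy = pd.DataFrame({"origin": ["usa", "japan", "usa", "europe", "usa", "japan"]})

counts = toy["origin"].value_counts()

# Upper limit for the y-axis: the largest count over the full dataset
y_domain = (0, int(counts.max()))

# Fixed list of categories for the x-axis, so bars keep their positions
x_domain = sorted(toy["origin"].unique())

print(x_domain, y_domain)
```

These values could then be passed to .scale(domain=...) in place of the hand-typed limits, which keeps the axes correct if the dataset changes.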
Interactive charts in Altair: Spotify dataset#
Import the attached Spotify dataset as df. In this csv file, missing values are denoted by a blank space. Use the na_values keyword argument with pd.read_csv so that those blank spaces get converted to np.nan.
import pandas as pd
df = pd.read_csv("spotify.csv", na_values=" ")
df.head()
| | Index | Highest Charting Position | Number of Times Charted | Week of Highest Charting | Song Name | Streams | Artist | Artist Followers | Song ID | Genre | ... | Danceability | Energy | Loudness | Speechiness | Acousticness | Liveness | Tempo | Duration (ms) | Valence | Chord |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 8 | 2021-07-23--2021-07-30 | Beggin' | 48,633,449 | Måneskin | 3377762.0 | 3Wrjm47oTz2sjIgck11l5e | ['indie rock italiano', 'italian pop'] | ... | 0.714 | 0.800 | -4.808 | 0.0504 | 0.1270 | 0.3590 | 134.002 | 211560.0 | 0.589 | B |
| 1 | 2 | 2 | 3 | 2021-07-23--2021-07-30 | STAY (with Justin Bieber) | 47,248,719 | The Kid LAROI | 2230022.0 | 5HCyWlXZPP0y6Gqq8TgA20 | ['australian hip hop'] | ... | 0.591 | 0.764 | -5.484 | 0.0483 | 0.0383 | 0.1030 | 169.928 | 141806.0 | 0.478 | C#/Db |
| 2 | 3 | 1 | 11 | 2021-06-25--2021-07-02 | good 4 u | 40,162,559 | Olivia Rodrigo | 6266514.0 | 4ZtFanR9U6ndgddUvNcjcG | ['pop'] | ... | 0.563 | 0.664 | -5.044 | 0.1540 | 0.3350 | 0.0849 | 166.928 | 178147.0 | 0.688 | A |
| 3 | 4 | 3 | 5 | 2021-07-02--2021-07-09 | Bad Habits | 37,799,456 | Ed Sheeran | 83293380.0 | 6PQ88X9TkUIAUIZJHW2upE | ['pop', 'uk pop'] | ... | 0.808 | 0.897 | -3.712 | 0.0348 | 0.0469 | 0.3640 | 126.026 | 231041.0 | 0.591 | B |
| 4 | 5 | 5 | 1 | 2021-07-23--2021-07-30 | INDUSTRY BABY (feat. Jack Harlow) | 33,948,454 | Lil Nas X | 5473565.0 | 27NovPIUIRrOZoCHxABJwK | ['lgbtq+ hip hop', 'pop rap'] | ... | 0.736 | 0.704 | -7.409 | 0.0615 | 0.0203 | 0.0501 | 149.995 | 212000.0 | 0.894 | D#/Eb |
5 rows × 23 columns
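To see the effect of na_values in isolation, here is a small sketch that reads a made-up three-row CSV from a string instead of from spotify.csv. The column names are borrowed from the dataset, but the rows are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV text in which a single blank space marks a missing value
csv_text = (
    "Song Name,Energy,Tempo\n"
    "A,0.8,120\n"
    "B, ,95\n"
    "C,0.5, \n"
)

# Treat the single-space string as missing when parsing
toy = pd.read_csv(io.StringIO(csv_text), na_values=" ")

# Number of missing entries per column
missing_per_column = toy.isna().sum()

print(toy.dtypes)
print(missing_per_column)
```

With the blanks converted to missing values, the numeric columns can be stored as floats, which is what the dtypes check in the next step looks for.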
Check your work by evaluating value_counts on df.dtypes. If everything worked correctly, there should be 11 float columns, 3 integer columns, and 9 object columns.
df.dtypes.value_counts()
float64 11
object 9
int64 3
dtype: int64
Plot the data from df using Altair. Encode the “Acousticness” data as the x-coordinate, the “Energy” data as the y-coordinate, and encode the “Valence” data as the color.
df.columns
Index(['Index', 'Highest Charting Position', 'Number of Times Charted',
'Week of Highest Charting', 'Song Name', 'Streams', 'Artist',
'Artist Followers', 'Song ID', 'Genre', 'Release Date', 'Weeks Charted',
'Popularity', 'Danceability', 'Energy', 'Loudness', 'Speechiness',
'Acousticness', 'Liveness', 'Tempo', 'Duration (ms)', 'Valence',
'Chord'],
dtype='object')
df.shape
(1556, 23)
alt.Chart(df).mark_circle().encode(
x="Acousticness",
y="Energy",
color="Valence"
)
Adjust the color scheme used to the dark2 color scheme.
alt.Chart(df).mark_circle().encode(
x="Acousticness",
y="Energy",
color=alt.Color("Valence").scale(scheme="dark2")
)
This color scheme still doesn’t look too good to me (some colors look very similar, even though they are on completely different sides of the spectrum)…let’s find a better one.
Change the color scheme from dark2 to a different one of these options (scroll down to find the options).
alt.Chart(df).mark_circle().encode(
x="Acousticness",
y="Energy",
color=alt.Color("Valence").scale(scheme="spectral")
)
Add a tooltip to the chart, indicating the Artist name and the song name.
alt.Chart(df).mark_circle().encode(
x="Acousticness",
y="Energy",
color=alt.Color("Valence").scale(scheme="spectral"),
tooltip=["Artist","Song Name"]
)
A chart with the 50 most frequently occurring artists#
Define a new variable s containing the pandas Series corresponding to the “Artist” column in df.
s = df["Artist"]
Call the value_counts method on s.
s.value_counts()
Taylor Swift 52
Lil Uzi Vert 32
Justin Bieber 32
Juice WRLD 30
Pop Smoke 29
..
Chris Brown, Young Thug 1
Rauw Alejandro, J Balvin 1
347aidan 1
Migrantes, Alico 1
Dadá Boladão, Tati Zaqui, OIK 1
Name: Artist, Length: 716, dtype: int64
Using the previous result, find the 50 most frequently occurring artists in this dataset. (Note: the value_counts method automatically sorts the results from most frequent to least frequent.)
s.value_counts()[:50]
Taylor Swift 52
Lil Uzi Vert 32
Justin Bieber 32
Juice WRLD 30
Pop Smoke 29
BTS 29
Bad Bunny 28
Eminem 22
The Weeknd 21
Ariana Grande 20
Drake 19
Billie Eilish 18
Selena Gomez 17
J. Cole 16
Doja Cat 16
Dua Lipa 15
Lady Gaga 14
Tyler, The Creator 14
DaBaby 14
21 Savage, Metro Boomin 12
Olivia Rodrigo 12
Kid Cudi 12
Mac Miller 11
Polo G 11
Lil Baby 10
Post Malone 10
Sam Smith 9
BLACKPINK 9
The Kid LAROI 9
J Balvin 9
Travis Scott 9
Ed Sheeran 9
Joji 8
Apache 207 7
XXXTENTACION 7
Morgan Wallen 7
Megan Thee Stallion 6
Maluma 6
Lil Nas X 6
Ava Max 6
Miley Cyrus 6
Machine Gun Kelly 6
Rauw Alejandro 6
Bonez MC 6
Migos 6
5 Seconds of Summer 5
Anuel AA 5
Shawn Mendes 5
Lauv 5
Jonas Brothers 5
Name: Artist, dtype: int64
Define a variable top_artists which contains these top 50 artists. (Hint: You might want to use the index attribute.)
I want to get the Artist names out of this series…
type(s.value_counts()[:50])
pandas.core.series.Series
Since the artist names are the indices of this series, we could use the index attribute of a pandas Series object.
top_artists = s.value_counts()[:50].index
top_artists
Index(['Taylor Swift', 'Lil Uzi Vert', 'Justin Bieber', 'Juice WRLD',
'Pop Smoke', 'BTS', 'Bad Bunny', 'Eminem', 'The Weeknd',
'Ariana Grande', 'Drake', 'Billie Eilish', 'Selena Gomez', 'J. Cole',
'Doja Cat', 'Dua Lipa', 'Lady Gaga', 'Tyler, The Creator', 'DaBaby',
'21 Savage, Metro Boomin', 'Olivia Rodrigo', 'Kid Cudi', 'Mac Miller',
'Polo G', 'Lil Baby', 'Post Malone', 'Sam Smith', 'BLACKPINK',
'The Kid LAROI', 'J Balvin', 'Travis Scott', 'Ed Sheeran', 'Joji',
'Apache 207', 'XXXTENTACION', 'Morgan Wallen', 'Megan Thee Stallion',
'Maluma', 'Lil Nas X', 'Ava Max', 'Miley Cyrus', 'Machine Gun Kelly',
'Rauw Alejandro', 'Bonez MC', 'Migos', '5 Seconds of Summer',
'Anuel AA', 'Shawn Mendes', 'Lauv', 'Jonas Brothers'],
dtype='object')
Question from the chat: Could we try casting this to a list?
Notice this returns the values (the counts), not the index labels (the artist names).
list(s.value_counts()[:50])
[52,
32,
32,
30,
29,
29,
28,
22,
21,
20,
19,
18,
17,
16,
16,
15,
14,
14,
14,
12,
12,
12,
11,
11,
10,
10,
9,
9,
9,
9,
9,
9,
8,
7,
7,
7,
6,
6,
6,
6,
6,
6,
6,
6,
6,
5,
5,
5,
5,
5]
But! We could try converting to a dictionary and taking its keys:
dict(s.value_counts()[:50]).keys()
dict_keys(['Taylor Swift', 'Lil Uzi Vert', 'Justin Bieber', 'Juice WRLD', 'Pop Smoke', 'BTS', 'Bad Bunny', 'Eminem', 'The Weeknd', 'Ariana Grande', 'Drake', 'Billie Eilish', 'Selena Gomez', 'J. Cole', 'Doja Cat', 'Dua Lipa', 'Lady Gaga', 'Tyler, The Creator', 'DaBaby', '21 Savage, Metro Boomin', 'Olivia Rodrigo', 'Kid Cudi', 'Mac Miller', 'Polo G', 'Lil Baby', 'Post Malone', 'Sam Smith', 'BLACKPINK', 'The Kid LAROI', 'J Balvin', 'Travis Scott', 'Ed Sheeran', 'Joji', 'Apache 207', 'XXXTENTACION', 'Morgan Wallen', 'Megan Thee Stallion', 'Maluma', 'Lil Nas X', 'Ava Max', 'Miley Cyrus', 'Machine Gun Kelly', 'Rauw Alejandro', 'Bonez MC', 'Migos', '5 Seconds of Summer', 'Anuel AA', 'Shawn Mendes', 'Lauv', 'Jonas Brothers'])
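One more way to answer the question from the chat, sketched on a made-up Series of six entries (an assumption for illustration): the index has its own tolist method, so the names can be turned into a plain Python list without going through a dictionary.

```python
import pandas as pd

# Hypothetical toy Series standing in for df["Artist"]
s_toy = pd.Series(
    ["Taylor Swift", "Drake", "Taylor Swift", "BTS", "Drake", "Taylor Swift"],
    name="Artist",
)

# head(2) takes the first two rows of the sorted counts, like [:2] above
top2 = s_toy.value_counts().head(2)

names = top2.index.tolist()  # the labels: artist names
counts = top2.tolist()       # the values: how often each appears

print(names, counts)
```

Calling list() or tolist() on the Series itself gives the counts, while calling it on the index gives the artist names.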
(More difficult.) Use the isin method (documentation) and Boolean indexing to define a new pandas DataFrame df2 which is the sub-DataFrame of df containing only the 50 most frequently occurring artists.
df2 = df[df["Artist"].isin(top_artists)]
df2
| | Index | Highest Charting Position | Number of Times Charted | Week of Highest Charting | Song Name | Streams | Artist | Artist Followers | Song ID | Genre | ... | Danceability | Energy | Loudness | Speechiness | Acousticness | Liveness | Tempo | Duration (ms) | Valence | Chord |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 2 | 3 | 2021-07-23--2021-07-30 | STAY (with Justin Bieber) | 47,248,719 | The Kid LAROI | 2230022.0 | 5HCyWlXZPP0y6Gqq8TgA20 | ['australian hip hop'] | ... | 0.591 | 0.764 | -5.484 | 0.0483 | 0.03830 | 0.1030 | 169.928 | 141806.0 | 0.478 | C#/Db |
| 2 | 3 | 1 | 11 | 2021-06-25--2021-07-02 | good 4 u | 40,162,559 | Olivia Rodrigo | 6266514.0 | 4ZtFanR9U6ndgddUvNcjcG | ['pop'] | ... | 0.563 | 0.664 | -5.044 | 0.1540 | 0.33500 | 0.0849 | 166.928 | 178147.0 | 0.688 | A |
| 3 | 4 | 3 | 5 | 2021-07-02--2021-07-09 | Bad Habits | 37,799,456 | Ed Sheeran | 83293380.0 | 6PQ88X9TkUIAUIZJHW2upE | ['pop', 'uk pop'] | ... | 0.808 | 0.897 | -3.712 | 0.0348 | 0.04690 | 0.3640 | 126.026 | 231041.0 | 0.591 | B |
| 4 | 5 | 5 | 1 | 2021-07-23--2021-07-30 | INDUSTRY BABY (feat. Jack Harlow) | 33,948,454 | Lil Nas X | 5473565.0 | 27NovPIUIRrOZoCHxABJwK | ['lgbtq+ hip hop', 'pop rap'] | ... | 0.736 | 0.704 | -7.409 | 0.0615 | 0.02030 | 0.0501 | 149.995 | 212000.0 | 0.894 | D#/Eb |
| 5 | 6 | 1 | 18 | 2021-05-07--2021-05-14 | MONTERO (Call Me By Your Name) | 30,071,134 | Lil Nas X | 5473565.0 | 67BtfxlNbhBmCDR2L2l8qd | ['lgbtq+ hip hop', 'pop rap'] | ... | 0.610 | 0.508 | -6.682 | 0.1520 | 0.29700 | 0.3840 | 178.818 | 137876.0 | 0.758 | G#/Ab |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1545 | 1546 | 128 | 1 | 2019-12-27--2020-01-03 | Candy | 5,632,102 | Doja Cat | 8671649.0 | 1VJwtWR6z7SpZRwipI12be | ['dance pop', 'pop'] | ... | 0.689 | 0.516 | -5.857 | 0.0444 | 0.51300 | 0.1630 | 124.876 | 190920.0 | 0.209 | G#/Ab |
| 1548 | 1549 | 178 | 1 | 2019-12-27--2020-01-03 | Old Town Road | 4,852,004 | Lil Nas X | 5488666.0 | 2YpeDb67231RjR0MgVLzsG | ['lgbtq+ hip hop', 'pop rap'] | ... | 0.878 | 0.619 | -5.560 | 0.1020 | 0.05330 | 0.1130 | 136.041 | 157067.0 | 0.639 | F#/Gb |
| 1549 | 1550 | 187 | 1 | 2019-12-27--2020-01-03 | Let Me Know (I Wonder Why Freestyle) | 4,701,532 | Juice WRLD | 19102888.0 | 3wwo0bJvDSorOpNfzEkfXx | ['chicago rap', 'melodic rap'] | ... | 0.635 | 0.537 | -7.895 | 0.0832 | 0.17200 | 0.4180 | 125.028 | 215381.0 | 0.383 | G |
| 1551 | 1552 | 195 | 1 | 2019-12-27--2020-01-03 | New Rules | 4,630,675 | Dua Lipa | 27167675.0 | 2ekn2ttSfGqwhhate0LSR0 | ['dance pop', 'pop', 'uk pop'] | ... | 0.762 | 0.700 | -6.021 | 0.0694 | 0.00261 | 0.1530 | 116.073 | 209320.0 | 0.608 | A |
| 1555 | 1556 | 199 | 1 | 2019-12-27--2020-01-03 | Lover (Remix) [feat. Shawn Mendes] | 4,595,450 | Taylor Swift | 42227614.0 | 3i9UVldZOE0aD0JnyfAZZ0 | ['pop', 'post-teen pop'] | ... | 0.448 | 0.603 | -7.176 | 0.0640 | 0.43300 | 0.0862 | 205.272 | 221307.0 | 0.422 | G |
678 rows × 23 columns
Check your answer: the shape of df2 should be 678 by 23.
df2.shape
(678, 23)
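The isin step is easier to follow on a table small enough to check by hand. The sketch below uses a made-up six-row DataFrame and keeps the top 2 artists rather than the top 50; all names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the Spotify DataFrame
toy = pd.DataFrame({
    "Artist": ["A", "B", "A", "C", "B", "A"],
    "Energy": [0.8, 0.6, 0.7, 0.4, 0.9, 0.5],
})

# Labels of the two most frequent artists
top = toy["Artist"].value_counts().head(2).index

# Boolean Series: True where the row's artist is one of the top labels
mask = toy["Artist"].isin(top)

# Boolean indexing keeps only the rows where the mask is True
sub = toy[mask]

print(sub)
```

The original row labels are kept in the result, which matches the gaps visible in the index of df2 above.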
Interactive Altair Chart#
Here, we create an interactive chart to go along with df2 that we just made above.
Make the same chart as you made above, with the only difference being that you now use df2 instead of df for the data.
alt.Chart(df2).mark_circle().encode(
x="Acousticness",
y="Energy",
color=alt.Color("Valence").scale(scheme="spectral"),
tooltip=["Artist","Song Name"]
)
Add a selection_interval object named brush to the chart.
brush = alt.selection_interval()
alt.Chart(df2).mark_circle().encode(
x="Acousticness",
y="Energy",
color=alt.Color("Valence").scale(scheme="spectral"),
tooltip=["Artist","Song Name"]
).add_params(brush)
Assign this chart to the variable name c1 using the code c1 = alt.Chart....
brush = alt.selection_interval()
c1 = alt.Chart(df2).mark_circle().encode(
x="Acousticness",
y="Energy",
color=alt.Color("Valence").scale(scheme="spectral"),
tooltip=["Artist","Song Name"]
).add_params(brush)
Display this chart by evaluating c1.
c1
Check your work: if you click and drag on the chart, there should be a grey rectangle that appears. (Once you’ve displayed the grey rectangle, you can move it around.)
Make a second chart c2 showing a bar chart for the selected data as in the previous part of lecture. The x-axis should correspond to Artist names (only the top 50 since we’re using df2) and the y-axis should correspond to the number of times those artists appear in the selection. (Use transform_filter with brush, as in the above notes.)
c2 = alt.Chart(df2).mark_bar().encode(
x="Artist",
y="count()"
).transform_filter(brush)
Display c1 and c2, one after the other, using c1&c2. (If you instead want them to appear side-by-side, you can use c1|c2.)
c1&c2
Find an image you like (including the selection) and save it using the … “Save as PNG” from the top right of the Deepnote cell with the two charts.
Upload that file to this Deepnote project, and embed that png file in a markdown cell. The syntax is ![alt text](filename.png), with filename.png replaced by the name of your uploaded file.
